Specs: specs/env.md, specs/rl.md, specs/training.md, specs/proof.md, specs/dashboard.md
Generated from git diff by delegated diff-reading agent. Code diffs are ground truth.
Infrastructure: factory convergence loop improvements, namely crank lifecycle documentation in SKILL.md, a hard convergence gate (ALL scenarios must pass), and a fix-forward commit policy for review pack rendering. The review pack renderer gains </script> escaping, a dynamic SVG viewBox, and zoom controls. 8 new holdout scenarios (13-20) were added for pong interface behaviors. The Codex prompt template was updated with the score_limit protected file.
Product: New interactive play module (src/play/) with two-player same-keyboard controls, agent takeover toggle, player status tags, and configurable score-limit rallies. Environment gains score_limit config for multi-rally mode (default 1 preserves RL training). Makefile adds play, play-debug, play-agent-vs-agent targets. pygame added to dependencies. 3 new test files covering play module, score-limit logic, and env smoke tests.
Major factory iteration: 10 files changed across skills, prompts, docs, and CLAUDE.md. Factory orchestrator SKILL.md gains crank lifecycle states (IN PROGRESS / CONVERGED / COMPLETE) and hard convergence gate — ALL scenarios must pass, no percentage threshold. Review pack renderer (render_review_pack.py) gains </script> content escaping, dynamic SVG viewBox calculation, and zoom controls. Template updated with visual inspection banner. CLAUDE.md adds fix-forward principle and factory orchestration hard rules. 8 new holdout scenarios validate play module behaviors. New spec pong_interfaces.md defines interactive play requirements. Factory feedback artifacts from iterations 0-1 and post-merge tracking included.
src/envs/minipong.py gains a score_limit config parameter and multi-rally logic: when score_limit > 1, a point that does not reach the limit resets the ball instead of ending the episode, and cumulative scores are tracked across rallies. The default score_limit=1 preserves existing RL training behavior. A new src/play/ module (2 files, +213 lines) implements interactive pygame-based play: two-player same-keyboard controls (Q/A left, P/L right), agent takeover via Shift+A/Shift+L, player status tags, HUD score display, and game flow (ESC quit, R restart). The agent receives a horizontally flipped observation when it controls the right side.
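
The control scheme can be modeled as a small, pygame-free state machine, which is roughly what the key-mapping and takeover tests exercise. This is a minimal sketch, not the code in src/play/play_minipong.py. All names (MOVE_KEYS, TOGGLE_KEYS, ControllerSketch, handle_key, status_tag) are hypothetical. Three further assumptions are not stated in the review: Q/P move a paddle up and A/L move it down, human presses for a paddle are ignored while the agent drives it, and the status tags read "HUMAN"/"AGENT".

```python
from dataclasses import dataclass, field

# Hypothetical bindings: the review only says "Q/A left, P/L right";
# treating Q/P as "up" (-1) and A/L as "down" (+1) is an assumption.
MOVE_KEYS = {
    "q": ("left", -1),
    "a": ("left", +1),
    "p": ("right", -1),
    "l": ("right", +1),
}
# Shift+A toggles agent control of the left paddle, Shift+L of the right.
TOGGLE_KEYS = {"shift+a": "left", "shift+l": "right"}


@dataclass
class ControllerSketch:
    # True means the agent drives that paddle; both sides start human-controlled.
    agent_controlled: dict = field(
        default_factory=lambda: {"left": False, "right": False}
    )

    def handle_key(self, key: str):
        """Return (side, direction) for a move key; toggle takeover for Shift keys."""
        key = key.lower()
        if key in TOGGLE_KEYS:
            side = TOGGLE_KEYS[key]
            self.agent_controlled[side] = not self.agent_controlled[side]
            return None
        if key in MOVE_KEYS:
            side, direction = MOVE_KEYS[key]
            # Assumption: human input is ignored while the agent owns this paddle.
            if self.agent_controlled[side]:
                return None
            return side, direction
        return None

    def status_tag(self, side: str) -> str:
        # Hypothetical tag wording; the spec defines the real content.
        owner = "AGENT" if self.agent_controlled[side] else "HUMAN"
        return f"{side.upper()}: {owner}"
```

Keeping the mapping as plain data makes the takeover toggle testable without opening a pygame window, which matches the review's note that the tests call real methods rather than mocks.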
3 new or extended test files: test_play_minipong.py (+48 lines) tests GameController key mapping, takeover toggle, and status tag accuracy; test_env_minipong_score_limit.py (+44 lines) tests multi-rally continuation and score_limit termination; test_env_minipong_smoke.py (+28 lines) adds further env smoke validations.
Makefile gains 3 new targets: play, play-debug, play-agent-vs-agent — convenience wrappers for the interactive play module. requirements.in adds pygame dependency. requirements.txt updated via pip-compile with pygame and transitive dependencies.
| File | Agents | Zone | Notable |
|---|---|---|---|
| src/play/play_minipong.py | MA B+ | environment | Well-structured play module, but GameController class is large (208 lines in one file) |
| src/envs/minipong.py | MA A | environment | Clean score_limit addition preserves backward compatibility |
| tests/test_play_minipong.py | MA B+ | tests | Tests exercise real GameController, no mocking of system under test |
| tests/test_env_minipong_score_limit.py | MA A | tests, environment | Anti-vacuous test: runs real env to score_limit, validates episode_reason |
| tests/test_env_minipong_smoke.py | MA B+ | tests, environment | Extended smoke test covers observation shape, dtype, and basic stepping |
| specs/pong_interfaces.md | MA A | factory | Comprehensive spec covers controls, rendering, scoring, agent takeover, status tags |
| Makefile | MA B+ | config | Clean Makefile additions for play targets |
| .claude/skills/factory-orchestrate/SKILL.md | MA A | factory | Crank lifecycle and hard convergence gate are critical process improvements |
| .claude/skills/pr-review-pack/* | MA B+ | factory | Review pack renderer fixes: script escaping, dynamic viewBox, zoom controls |
| docs/pr9_review_pack.html | MA N/A | factory | Generated artifact; not reviewed for code quality |
| docs/pr9_diff_data.json | MA N/A | factory | Generated artifact; deterministic diff output |
| artifacts/factory/* | MA N/A | factory | Factory iteration artifacts: feedback from iterations 0, 1, and post-merge |

Per-file review notes:

- src/play/play_minipong.py: The play module implements the full spec correctly: two-player controls, agent takeover, status tags, score-limit rallies, and game flow. The GameController class handles input mapping, agent toggle, observation flipping, and rendering all in one file. It is functional and tested, but a future refactor could separate input handling from rendering. The agent checkpoint loading with random-policy fallback is a good defensive pattern. No gaming or hardcoded lookup tables were detected.
- src/envs/minipong.py: The score_limit parameter defaults to 1, preserving existing RL training behavior while enabling multi-rally mode for interactive play. The implementation correctly resets ball position and velocity using the episode RNG after each point, maintaining determinism. Cumulative score tracking is clean. The change is minimal (+46/-8) and surgical.
- tests/test_play_minipong.py: Tests instantiate the actual GameController and verify key mappings, takeover toggle, and status tag content. There is no mocking of the system under test; tests call real methods and assert real outputs. Coverage includes both human and agent control states. The suite could benefit from edge-case tests (e.g., simultaneous key presses, rapid toggle sequences) in a follow-up.
- tests/test_env_minipong_score_limit.py: Tests create a real MiniPongEnv with score_limit=11, step through until termination, and verify that multiple points were scored and that episode_reason == 'score_limit'. No stubs, no mocking of the env. The test proves multi-rally mode works end-to-end.
- tests/test_env_minipong_smoke.py: Expands the existing smoke test with additional assertions. Tests use real env instances, with no mocking. Clean and focused.
- specs/pong_interfaces.md: The spec is thorough and unambiguous: key mappings in a table, rendering requirements with specific pixel sizes, scoring rules with a configurable limit, agent takeover controls, status tag content for all states, and Makefile targets. This is a well-written spec that leaves minimal room for misinterpretation.
- Makefile: Three new targets (play, play-debug, play-agent-vs-agent) follow the existing Makefile pattern. Commands use python -m src.play.play_minipong module invocation, consistent with the project convention.
- .claude/skills/factory-orchestrate/SKILL.md: Adds a three-state crank lifecycle (IN PROGRESS / CONVERGED / COMPLETE) and removes the percentage-based satisfaction threshold in favor of ALL-scenarios-must-pass. This is a significant process hardening: it prevents partial convergence from being declared a success. The fix-forward principle is also codified here.
- .claude/skills/pr-review-pack/*: The renderer gains three important fixes: (1) </script> in embedded content is escaped to <\/script to prevent HTML parsing breakage, (2) the SVG viewBox is calculated dynamically from zone positions instead of being hardcoded, and (3) zoom controls (+/-/Fit) are added to the template. These are systemic fixes that prevent recurring rendering issues.
- docs/pr9_review_pack.html: This is a generated HTML review pack (3710 lines). It is an output artifact of the review pack renderer, not hand-written code, and is excluded from adversarial review.
- docs/pr9_diff_data.json: Generated by the Pass 1 diff collection script. Contains raw diffs and file contents for all 34 changed files. Not hand-written code.
- artifacts/factory/*: Three feedback markdown files tracking the factory convergence loop: iteration 0 seed feedback, iteration 1 fixes, and post-merge follow-up. These are process artifacts, not product code.
|||
| Check | Status | Time | Rating |
|---|---|---|---|
| factory-self-test (push) | pass | 17s | normal |
| validate (push) | fail | 2m 10s | acceptable |
| validate (push) | fail | 2m 16s | acceptable |
| factory-loop (push) | pass | 4m 57s | acceptable |
| factory-self-test (push) | pass | 20s | normal |

Thresholds: ✓ under 1m = normal • ○ 1-5m = acceptable • ⚠ 5-10m = watch • ✖ over 10m = needs refactoring
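
The rating legend is a simple bucketing rule on job duration. A minimal sketch, assuming durations in seconds and half-open buckets, so exactly 60 s counts as "acceptable", 300 s as "watch", and 600 s as "needs refactoring" (the legend does not say which side the boundaries fall on). The name classify_duration is hypothetical.

```python
def classify_duration(seconds: float) -> str:
    """Bucket a CI job duration using the review pack's legend.

    Assumption: buckets are half-open [lower, upper), so boundary values
    fall into the slower bucket.
    """
    if seconds < 60:
        return "normal"
    if seconds < 5 * 60:
        return "acceptable"
    if seconds < 10 * 60:
        return "watch"
    return "needs refactoring"
```

Applied to the rows above, 17 s rates "normal" and 2m 10s (130 s) and 4m 57s (297 s) rate "acceptable", matching the table.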
Rather than extending the existing environment with interactive rendering, a separate src/play/ module was created. This keeps the RL training environment clean (headless, single-agent, score_limit=1) while the play module handles human interaction, two-player controls, agent takeover, and multi-rally scoring. The play module imports from src.envs.minipong for game physics and from the RL modules for agent policies.
| File | Change |
|---|---|
| specs/pong_interfaces.md | New spec defining interactive play requirements |
| src/play/__init__.py | New module init |
| src/play/play_minipong.py | Interactive game loop with GameController |
| Makefile | Added play, play-debug, play-agent-vs-agent targets |
The score_limit parameter was added to MiniPongConfig with a default of 1, meaning existing code (training, evaluation, all 12 original scenarios) sees no behavioral change. Only the interactive play module sets score_limit=11 for multi-rally games. This is a classic additive-only API extension — new behavior is opt-in.
| File | Change |
|---|---|
| src/envs/minipong.py | Added score_limit to MiniPongConfig, multi-rally logic in step() |
| specs/env.md | Documented score_limit config and multi-rally mode |
| tests/test_env_minipong_score_limit.py | Tests verifying multi-rally behavior |
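
The opt-in rule can be illustrated with a toy scorekeeper. This is a hedged sketch, not MiniPongEnv.step(). Only a few elements come from this review: the default of 1, resetting the ball instead of terminating, cumulative scores, and drawing the new serve from the episode RNG. The names RallyScore, on_point, and reset_ball, the serve distribution, and the rule that the episode ends when either side reaches score_limit are all assumptions.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class RallyScore:
    # Default 1: the first point ends the episode, as in RL training.
    score_limit: int = 1
    # Stand-in for the episode RNG; seeding it makes every serve reproducible.
    rng: np.random.Generator = field(default_factory=lambda: np.random.default_rng(0))
    left: int = 0
    right: int = 0

    def reset_ball(self):
        # Hypothetical serve: centred ball, direction and angle drawn from the RNG.
        vx = self.rng.choice([-1.0, 1.0])
        vy = self.rng.uniform(-0.5, 0.5)
        return (0.5, 0.5), (vx, vy)

    def on_point(self, scorer: str):
        """Record a point and return (terminated, new_ball_state_or_None)."""
        if scorer == "left":
            self.left += 1
        else:
            self.right += 1
        # Assumption: the episode ends once either side reaches score_limit.
        if max(self.left, self.right) >= self.score_limit:
            return True, None
        # Otherwise keep playing: serve again, cumulative scores untouched.
        return False, self.reset_ball()
```

With the default, any point terminates immediately, so existing training code never reaches the reset branch. Only a caller that opts into score_limit=11 sees multi-rally play.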
When a trained DQN agent takes over the right paddle, the 84x84 observation is horizontally flipped before being fed to the policy. This ensures the agent always perceives itself as the left paddle, matching its training perspective. Without this flip, a trained checkpoint would make incorrect decisions when playing from the right side.
| File | Change |
|---|---|
| src/play/play_minipong.py | Horizontal flip via np.flip for right-side agent observations |
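
A minimal sketch of the flip. It assumes observations are (84, 84) or channel-last (84, 84, C) NumPy arrays with width on axis 1. The helper name agent_view is hypothetical, and the real code may flip a different axis if its layout differs.

```python
import numpy as np


def agent_view(obs: np.ndarray, side: str) -> np.ndarray:
    """Return the observation as the policy should see it.

    The policy was trained as the left paddle, so when it drives the right
    paddle the frame is mirrored left-to-right (axis 1 = width is an
    assumption about the observation layout).
    """
    if side == "right":
        # np.flip returns a negative-stride view; some consumers
        # (e.g. torch.from_numpy) reject those, so make a contiguous copy.
        return np.ascontiguousarray(np.flip(obs, axis=1))
    return obs
```

For example, a paddle drawn in column 80 of an 84-pixel-wide frame appears in column 3 (84 - 1 - 80) after the flip, which is where a left-side paddle sits.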
The factory orchestrator SKILL.md was updated to remove the satisfaction percentage threshold. Convergence now requires ALL scenarios passing — a single failure blocks convergence regardless of cause (regression, merge conflict, pre-existing). This was learned from a near-miss where scenario 7 was failing pre-merge but the percentage threshold would have allowed it through.
| File | Change |
|---|---|
| .claude/skills/factory-orchestrate/SKILL.md | Removed satisfaction threshold, added ALL-pass requirement |
| CLAUDE.md | Added factory orchestration hard rules section |
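
The gap between the old threshold and the new gate shows up clearly in code. A hedged sketch: CrankState, old_gate, new_gate, and crank_state are hypothetical names, and the 0.9 threshold is illustrative only, since the review does not state the old percentage. The three state names and the ALL-pass rule come from SKILL.md as summarized here. Treating COMPLETE as "converged plus post-convergence work done" is an assumption.

```python
from enum import Enum


class CrankState(Enum):
    IN_PROGRESS = "IN PROGRESS"
    CONVERGED = "CONVERGED"
    COMPLETE = "COMPLETE"


def old_gate(results: dict[str, bool], threshold: float = 0.9) -> bool:
    # Removed behaviour: a pass *rate* at or above a threshold (0.9 is illustrative).
    if not results:
        return False
    return sum(results.values()) / len(results) >= threshold


def new_gate(results: dict[str, bool]) -> bool:
    # Hard gate: every scenario must pass, whatever caused a failure.
    return bool(results) and all(results.values())


def crank_state(results: dict[str, bool], review_done: bool = False) -> CrankState:
    if not new_gate(results):
        return CrankState.IN_PROGRESS
    # Assumption: COMPLETE means post-convergence steps (e.g. the review pack) are done.
    return CrankState.COMPLETE if review_done else CrankState.CONVERGED
```

With 20 scenarios and only scenario 7 failing, the pass rate is 19/20 = 0.95, so old_gate would accept the run while new_gate blocks it. That is the near-miss described above.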
Three systemic renderer fixes: (1) </script> in embedded file content breaks HTML parsing — now escaped as <\/script. (2) SVG viewBox was hardcoded — now dynamically calculated from zone positions via _calculate_viewbox(). (3) Added zoom controls (+/-/Fit) to the architecture diagram. These are fix-forward changes: instead of patching individual review packs, the renderer itself was fixed so all future packs are correct.
| File | Change |
|---|---|
| .claude/skills/pr-review-pack/scripts/render_review_pack.py | Added _escape_script_closing(), _calculate_viewbox(), marker validation |
| .claude/skills/pr-review-pack/assets/template.html | Added zoom controls, visual inspection banner |
| .claude/skills/pr-review-pack/SKILL.md | Updated with rendering fix documentation |
Scenarios 13-20 were added as holdout evaluation criteria for the pong interfaces spec. They cover: play module importability, Makefile targets, two-player keyboard controls, agent takeover toggle, player status tags, continuous multi-rally play, right-side observation flip, and pygame dependency. These scenarios are factory-protected — the coding agent (Codex) never sees them.
| File | Change |
|---|---|
| scenarios/13_play_module_imports.md | Validates play module imports |
| scenarios/14_play_makefile_targets.md | Validates Makefile play targets |
| scenarios/15_two_player_controls.md | Validates keyboard control mapping |
| scenarios/16_agent_takeover_toggle.md | Validates Shift+A/L toggle |
| scenarios/17_player_status_tags.md | Validates status tag accuracy |
| scenarios/18_continuous_play.md | Validates multi-rally mode |
| scenarios/19_agent_observation_flip.md | Validates right-side obs flip |
| scenarios/20_pygame_dependency.md | Validates pygame in requirements.in |

Add src/play/ to the zone registry
The new src/play/ module does not match any existing zone in .claude/zone-registry.yaml. Currently the play files are orphaned from the architecture diagram. They should either get their own zone or be added to the environment zone's paths.
The GameController class in src/play/play_minipong.py (208 lines) handles input mapping, agent toggle, observation flipping, score display, status tags, and the main game loop all in one class. A future refactor could separate concerns.
Every scenario that produces videos logs IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (84, 84) to (96, 96). While not a functional issue, the warnings clutter scenario output.
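
The resize in that warning rounds each dimension up to the next multiple of macro_block_size: ceil(84 / 16) x 16 = 6 x 16 = 96. The hypothetical helper below only reproduces that arithmetic and does not call imageio. Two possible ways to avoid the warning, offered as options rather than what the repo does: render frames at a block-aligned size, or pass a smaller macro_block_size to imageio's FFmpeg writer. A value of 4 divides 84 evenly. A value of 1 disables the resize, although imageio warns that odd sizes can break some players and codecs.

```python
def padded_frame_size(width: int, height: int, macro_block_size: int = 16) -> tuple[int, int]:
    """Return the size a frame is rescaled to when not block-aligned."""

    def round_up(n: int) -> int:
        # Integer ceiling division, then scale back up to a whole block count.
        return -(-n // macro_block_size) * macro_block_size

    return round_up(width), round_up(height)
```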
| Phase | Gate 1 | Gate 2 | Gate 3 | Action |
|---|---|---|---|---|
| Iter 1 (Codex) | pass | pass | 12/20 | Gate 0 blocked: code merged (not reverted), feedback compiled for iter 2 |
| Iter 2 (Codex) | pass | pass | 20/20 | Converged: all gates pass, PR created |
| Validation + Fix-Forward | pass | pass | 20/20 | Review pack generated, fix-forward renderer improvements committed |